Abstract. Automatically generalising (interactive) proofs and using such
generalisations to discharge related conjectures is a very hard problem
which remains unsolved. Here, we develop a notion of goal types to capture
key properties of goals, which enables abstractions over the specific order
and number of sub-goals arising when composing tactics. We show that the
goal types form a lattice, and utilise this property in the techniques we
develop to automatically generalise proof strategies in order to reuse them
for proofs of related conjectures. We illustrate our approach with an
example.

1 Introduction 

When mechanising large systems, either mathematical theorems or within formal
methods, one often ends up applying the same strategy many times, albeit
with small variations. An expert developer of a theorem proving system may
implement such a strategy as a tactic; however, non-developers often need to
manually prove each conjecture. In order to support analogous reasoning, one
needs to generalise a proof into a sufficiently general strategy which abstracts
over the variations between two proofs (the source and target), whilst still
capturing the key properties.

To illustrate, we will use a running example from a subset of separation logic
[21], used to reason about pointer-based programs. In the subset, there are two
binary operations $*$ and $\wedge$ and a predicate pure, with the following axioms:

$$(A * B) * C \Leftrightarrow A * (B * C) \qquad (\text{ax1})$$
$$\mathit{pure}(B) \rightarrow (A \wedge B) * C \Leftrightarrow (A * C) \wedge B \qquad (\text{ax2})$$

Now, consider the conjecture:

$$p : \mathit{pure}(e),\; h : c * ((f * (d * b) \wedge e) \wedge e) * a \;\vdash\; (((c * f \wedge e) * d \wedge e) * b) * a \qquad (1)$$

This can be proven as follows: apply the ax1 substitution four times, which gives
the goal $g : c * ((f \wedge e) * (d \wedge e) * b) * a$; then apply the ax2 substitution,
discharging the first subgoal by p, twice; finally, discharge by h. The shape of
the proof can be seen (left-most) in Figure 1.

When developing such a proof interactively, there is a large amount of
information the user utilises to guide the proof, including:

— The conclusion to be proven, $(((c * f \wedge e) * d \wedge e) * b) * a$ initially.

— The facts available, including local assumptions (such as p and h), and
axioms/lemmas (e.g. ax1 and ax2).

— Definitions, e.g. of * and pure (not present above). 

— Any fixed/shared variables (not present above). 

— Properties between facts and the conclusion (or other facts). E.g. ax2 is 
applied because the condition of it can be discharged by p. 

— Information between proof states/steps. E.g. we move the conclusion towards 
the h assumption, thus each step has to reduce this "distance". 

— Properties relating goals to tactics; e.g. after applying ax2 one subgoal is 
discharged by p but not the other. 

The "key properties" which capture a generalised proof strategy must be able 
to use such information - thus, the underlying representation of a proof strategy 
needs the ability to capture it. 

We argue that this information cannot be naturally expressed in tactic
languages. For example, one cannot distinguish sub-goals, thus composition of
tactics depends on their number and order. The problem is that both
declarative and procedural features are required to be able to both generalise
and replay proof strategies. In previous work [5], we developed a graphical proof
strategy language, where the tactics are on nodes, and the edges describe the
goals, using a goal type. This paper extends this work in two ways:

1. We develop a theory of goal types in §3 and show how it relates to evaluation
in §4.

2. Using the goal type as a building block we develop techniques to generalise
a strategy to enable reuse in §5.

In the next section we give background on the underlying graph based language
to write strategies, while we discuss related work and conclude in §6 and §7.

2 Background on the Proof Strategy Language 

The graphical proof strategy language was introduced in [5], built upon the
mathematical formalism of string diagrams [5]. A string diagram consists of boxes and
wires, where wires express the composition of boxes. Moreover, wires are allowed
to be free (i.e. not connected to a box) at one or both ends. These free wires are
used to express inputs and outputs of the diagram, and form the boundary of
a diagram. Wires are labelled with data, which provides a type safe mechanism
for composing and rewriting diagrams. To mechanise the theory, there is a more
formal notion of string graphs; however, these details are elided here, and we refer
to [5,8] for more details.

In a proof strategy graph [5], the wires are labelled with goal types, which are
developed in the next section. A vertex either contains a tactic or a list of goals,
where the goal nodes are used purely to evaluate/prove a conjecture using the
strategy. This is achieved by propagating goal nodes toward the output wires
using graph rewriting.

For a type $\tau$, let $[\tau]$ be the type of finite lists and $\{\tau\}$ be the type of finite
sets whose elements are of type $\tau$. One type of tactic is an atomic tactic, which
corresponds to a tactic of the underlying theorem prover, which we here assume
works on a proof state (containing named hypotheses, the open conjecture, fixed
variables etc). A tactic can then be seen as a function

$$\text{proof state} \rightarrow \{[\text{proof state}]\},$$

which produces a set of possible subgoals required to be proven. For this paper,
we assume two standard tactics: subst (arg) and rule (arg), which perform a
single substitution or resolution step, respectively. (arg) may be: a single rule; a
set of rules; or a class (a description of a set of rules, which we return to in the
next section). The use of a set/class is the same as encapsulating each member of it in
an OR combinator. In order to apply an atomic tactic in the strategy language,
it has to be typed with goal types. Let $\alpha$ and $\beta_i$ represent goal type variables. A
typed tactic is then a function of the form:

$$\alpha \rightarrow \{[\beta_1] \times [\beta_2] \times \ldots \times [\beta_n]\}$$

These type properties have to be reflected by the goals on a graph, and (un-)lifting
capabilities between goals and proof states have to be provided. This is the topic
of §4.

Another type of tactic is a graph tactic, which is a node containing (nested)
proof strategy graph(s). Here, a single graph is simply a hierarchical view, while
multiple graphs (which must have the same boundary) represent branching in
the search space. These are evaluated recursively (see [5] for details) but can also
be unfolded by rewriting, as illustrated by X (one graph) and Y (two graphs,
meaning two ways to unfold):



[Figure: unfolding of the graph tactics X and Y by rewriting]






3 Towards a Theory of Goal Types 



Classes. A goal type must be able to capture the intuition of the user, potentially
using all the information listed in §1. This information is then used to guide the
proof and send sub-goals to the correct tactic. To achieve this we firstly need
to capture important properties of the conclusion of the conjecture. Next, it
is important to note that, in general, most of the information available is not
relevant, and including it will act as noise (and increase the chance of
"over-fitting" a strategy to a particular proof). Thus, we need to be able to separate the
wheat from the chaff, and capture properties of the 'relevant' facts, where facts
refer to both lemmas/axioms and assumptions which are local to the conclusion.
Henceforth we will term a fact or conclusion an element. There is a large set
of such properties, e.g.:

— a particular shape or sub-shape; 

— the symbols used, or symbols at particular positions (e.g. top symbol); 

— certain types of operators are available, e.g. (1) contains associative-commutative
operators;

— it contains variables we apply induction to or (shared) meta-variables; 

— certain rules are applicable; 

— the origin, e.g. it is from group theory or it is a property of certain operator. 
This list is by no means complete, and here we will focus on two such properties: 

— top_symbol describes the top level symbol;

— has_symbol describes the symbols it must contain. 

Each such feature will have data associated: 

$$\mathit{data} := \mathit{int} \mid \mathit{term} \mid \mathit{position} \mid \mathit{boolean}$$

where term refers to the term of the underlying logic, and a position refers to 
an index of a term tree. A class accounts for a family of elements where certain 
such features hold: 

Definition 1. A class is a map

$$\mathit{class} := \mathit{name} \rightarrow [[\mathit{data}]]$$

such that for each name in the domain of a class, there is an associated
predicate on an element, termed the matcher. There are two special cases where the
predicate always succeeds or always fails on certain data, denoted by $\top_f$ and
$\bot_f$ as described below. A class matches a conclusion/fact if the predicate of
each maplet holds.

The intuition behind the list of lists of data is that it represents a property
in DNF, e.g. $[[a, b], [c]]$ can be seen as $(a \wedge b) \vee c$. For the conjecture in (1),
$\{(\mathit{top\_symbol} \mapsto [[*]]), (\mathit{has\_symbol} \mapsto [[*, \wedge]])\}$ identifies the conclusion, while
$\{(\mathit{has\_symbol} \mapsto [[\mathit{pure}]])\}$ identifies the first assumption, but not the second, and
$\{(\mathit{has\_symbol} \mapsto [[\mathit{pure}], [*]])\}$ captures both assumptions and the goal.
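The DNF reading of a class can be made concrete with a small sketch. The term encoding (nested tuples), the helper names `symbols`, `top_symbol` and `matches`, and the spelling of the connective as "and" are all assumptions made for illustration; they are not taken from the implementation of the language.

```python
# Illustrative sketch only. Terms are nested tuples (symbol, arg1, ...);
# atoms are plain strings. This encoding is an assumption of the sketch.

def symbols(term):
    """Return the set of function symbols occurring in a term."""
    if isinstance(term, str):
        return set()
    head, *args = term
    result = {head}
    for arg in args:
        result |= symbols(arg)
    return result

def top_symbol(term):
    """Return the top-level symbol of a term, or None for an atom."""
    return None if isinstance(term, str) else term[0]

# One matcher per feature name: does a single conjunct (a list of data) hold?
MATCHERS = {
    "top_symbol": lambda conj, t: all(top_symbol(t) == s for s in conj),
    "has_symbol": lambda conj, t: all(s in symbols(t) for s in conj),
}

def matches(cls, term):
    """A class matches if, for every feature, some conjunct of its DNF holds."""
    return all(
        any(MATCHERS[name](conj, term) for conj in dnf)
        for name, dnf in cls.items()
    )

# Conjecture (1), reading "and" as binding tighter than "*".
concl = ("*", ("*", ("*", ("*", "c", ("and", "f", "e")), ("and", "d", "e")), "b"), "a")
hyp_p = ("pure", "e")
hyp_h = ("*", ("*", "c", ("and", ("*", "f", ("and", ("*", "d", "b"), "e")), "e")), "a")

conclusion_class = {"top_symbol": [["*"]], "has_symbol": [["*", "and"]]}
pure_class = {"has_symbol": [["pure"]]}
broad_class = {"has_symbol": [["pure"], ["*"]]}
```

In this encoding an empty list of conjuncts never matches, while a list containing one empty conjunct always matches, which is one possible way to represent the two special cases of Definition 1.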

We write the constant space of names as $\mathcal{N}$ and, for a class $C$ with $n \in \mathcal{N}$,
$C(n)$ is the data associated with feature $n$ for class $C$. We define a semantic
representation of the data for a particular feature in a class using the notation
$x^s$ for some data $x$. It is then possible to reason about this data. For example,
for the feature has_symbol in the conjecture (1) we write for $C(\mathit{has\_symbol})^s$:

$$[[a_1, \ldots, a_m], \ldots, [b_1, \ldots, b_n]]^s = ((\lfloor a_1 \rfloor \cap \ldots \cap \lfloor a_m \rfloor) \cup \ldots \cup (\lfloor b_1 \rfloor \cap \ldots \cap \lfloor b_n \rfloor)) \qquad (2)$$

where $\lfloor a \rfloor$ denotes $a$ as an atom.

Classes form a bounded lattice $(\mathcal{C}, \vee, \wedge, \top, \bot)$, on which we can define a meet
and a join. We show how to compute the join ($\vee$: least upper bound) and meet
($\wedge$: greatest lower bound) for two classes $C_1$ and $C_2$. We define the most general
class as $\top$ and the empty class as $\bot$. We write the most general element of $C(f)$
as $\top_f$ and the least general as $\bot_f$.

Definition 2. $C_1 \wedge C_2$ is the greatest lower bound of $C_1$ and $C_2$ if
$\forall n \in \mathcal{N}.\ (C_1 \wedge C_2)(n) = C_1(n) \wedge_n C_2(n)$, where $\wedge_n$ computes the greatest
lower bound for feature $n$.

Definition 3. $C_1 \vee C_2$ is the least upper bound of $C_1$ and $C_2$ if
$\forall n \in \mathcal{N}.\ (C_1 \vee C_2)(n) = C_1(n) \vee_n C_2(n)$, where $\vee_n$ computes the least
upper bound for feature $n$.

For $f = \mathit{top\_symbol}$ or $f = \mathit{has\_symbol}$ we define $\wedge_f$ and $\vee_f$ as:

Definition 4. $C_1(f) \wedge_f C_2(f) := C_1(f)^s \cap C_2(f)^s$ and $C_1(f) \vee_f C_2(f) := C_1(f)^s \cup C_2(f)^s$.

We further define $\top_f$ and $\bot_f$ to be $U$ and $\emptyset$ respectively. To show that classes
form a partial order, we prove the following properties about meet and join:

Theorem 1. $\wedge$ and $\vee$ are commutative and associative operations.

Proof. It suffices to prove that $\wedge_f$ and $\vee_f$ are commutative and associative for each
$f \in \mathcal{N}$. In our example we use Definitions 4 and 3. This is provable since $\cap$ and
$\cup$ are commutative, associative and idempotent operations in set theory.

Theorem 2. $\wedge$ and $\vee$ follow the absorption laws $a \vee (a \wedge b) = a$ and $a \wedge (a \vee b) = a$.

Proof. It suffices to prove that $\wedge_f$ and $\vee_f$ follow the absorption laws. This follows from
the fact that $\cap$ and $\cup$ are set theoretic operations. It also follows that $\wedge$ and $\vee$
are idempotent: $a \wedge a = a$, $a \vee a = a$.

Since $\bot$ is $\emptyset$ and $\top$ is $U$, it is trivial to show that $C \vee \bot = C$ and $C \wedge \top = C$ for
a class $C$. Thus, classes form a bounded lattice.

Orthogonality is a key property to reduce non-determinism during evaluation
of a strategy, whilst subtyping of classes is a key feature for our generalisation
techniques discussed in §5.

Definition 5. $C_1$ and $C_2$ are orthogonal if $\exists f \in \mathcal{N}.\ C_1(f) \wedge C_2(f) = \bot_f$. We
write this as $C_1 \perp C_2$. $C_1$ is a subtype of $C_2$, written $C_1 <: C_2$, if
$\forall f \in \mathcal{N}.\ C_1(f) \wedge C_2(f) = C_1(f)$.

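As an example, consider goal classes with features has_symbol and top_symbol:

$$C_1 : \{(\mathit{top\_symbol} \mapsto [[*]]), (\mathit{has\_symbol} \mapsto [[*, \wedge], [\vee, *]])\}$$
$$C_2 : \{(\mathit{top\_symbol} \mapsto [[\wedge]]), (\mathit{has\_symbol} \mapsto [[*, \wedge], [\vee, *]])\} \qquad (3)$$
$$C_3 : \{(\mathit{top\_symbol} \mapsto [[*]]), (\mathit{has\_symbol} \mapsto [[*, \wedge, \vee]])\} \qquad (4)$$

$C_2 \perp C_3$ as there is a feature (top_symbol) for which $C_2(f) \perp_f C_3(f)$, since by the
semantics $\lfloor \wedge \rfloor \cap \lfloor * \rfloor = \emptyset$. In order to determine whether $C_3$ is a subtype of $C_1$
we must show that $(C_1(f) \wedge C_3(f)) = C_3(f)$ for all features. Using Definition 4
we must prove for has_symbol:

$$((\lfloor * \rfloor \cap \lfloor \wedge \rfloor) \cup (\lfloor \vee \rfloor \cap \lfloor * \rfloor)) \cap (\lfloor * \rfloor \cap \lfloor \wedge \rfloor \cap \lfloor \vee \rfloor) = (\lfloor * \rfloor \cap \lfloor \wedge \rfloor \cap \lfloor \vee \rfloor)$$

which is true; the same holds for top_symbol, where in this case it follows trivially.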
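The orthogonality and subtype checks on the three example classes can be replayed with a brute-force set model. The model is an assumption of this sketch, not part of the theory: an element is abstracted to a pair (top symbol, set of symbols it contains) over a three-symbol alphabet, and an atom denotes the set of such pairs that satisfy it. A feature missing from a class is treated as unconstrained.

```python
from itertools import combinations

SIGMA = ("*", "and", "or")  # assumed alphabet; "and"/"or" stand for the connectives

# Assumed model: an element is (top symbol, frozenset of symbols it contains).
UNIVERSE = frozenset(
    (top, frozenset(syms))
    for r in range(1, len(SIGMA) + 1)
    for syms in combinations(SIGMA, r)
    for top in syms
)

# Denotation of an atom, per feature.
ATOM = {
    "top_symbol": lambda a: frozenset(e for e in UNIVERSE if e[0] == a),
    "has_symbol": lambda a: frozenset(e for e in UNIVERSE if a in e[1]),
}

def sem(feature, dnf):
    """Union over the conjuncts of the intersection of the atoms, as in (2)."""
    result = frozenset()
    for conj in dnf:
        block = UNIVERSE
        for a in conj:
            block &= ATOM[feature](a)
        result |= block
    return result

def feature_sem(cls, feature):
    """A feature absent from a class is unconstrained (top)."""
    return sem(feature, cls[feature]) if feature in cls else UNIVERSE

def orthogonal(c1, c2):
    """Some feature has an empty meet."""
    return any(not (feature_sem(c1, f) & feature_sem(c2, f)) for f in ATOM)

def subtype(c1, c2):
    """For every feature, the meet with c2 gives back c1."""
    return all(
        feature_sem(c1, f) & feature_sem(c2, f) == feature_sem(c1, f) for f in ATOM
    )

C1 = {"top_symbol": [["*"]], "has_symbol": [["*", "and"], ["or", "*"]]}
C2 = {"top_symbol": [["and"]], "has_symbol": [["*", "and"], ["or", "*"]]}
C3 = {"top_symbol": [["*"]], "has_symbol": [["*", "and", "or"]]}
```

Here `orthogonal(C2, C3)` corresponds to the empty meet on the top symbol, and `subtype(C3, C1)` corresponds to the displayed set equation for has_symbol.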

Links. A class identifies a cluster of elements with certain common properties.
However, certain types of properties hold between elements, e.g. a conditional
fact can only be applied if the condition can be discharged, which relies on other
facts. Moreover, certain properties rely on historic information, e.g. a measure
has to be reduced in a rewriting step to ensure termination. Such properties
include:

— common symbols between two elements, or the position they are at; 

— common shapes between two elements; 

— embedding of one element into another; 

— some form of difference between elements;

— some sort of measure reduces/increases between elements.

We call such properties a link. Moreover, we abstract such properties to make
them relations between classes rather than between elements. This should be
given an existential meaning: a link between two classes entails that there exist
elements in them such that this holds. In addition, we introduce a parent function
on links to refer to the parent node. The meaning of this will become clearer in
the next section, where we discuss evaluation.

Definition 6. A link is a map

$$\mathit{link} := \mathit{name} \times \mathit{class} \times \mathit{class} \rightarrow [[\mathit{data}]]$$

such that for each name $n$ in the domain of a link, there is an associated predicate
$n : [[\mathit{data}]] \times \mathit{element} \times \mathit{element} \rightarrow \mathbb{B}$ called a matcher. A link matches a
conclusion/fact if the predicate of each maplet holds.

We write the constant space of link names as $\mathcal{N}_l$ and, for a link $L$ with $n \in \mathcal{N}_l$,
$L(n, C_1, C_2)$ is the data associated with feature $n$, classes $C_1$ and $C_2$, for link $L$.

We will only consider the link features is_match and symb_at_pos for this
exposition. The data of the former is booleans in DNF, and its matcher succeeds
if the result of an exact match between the elements is the same as the semantic
value of the data. The data of the latter is lists of positions, where for example

$$\{(\mathit{symb\_at\_pos}, C_1, C_2) \mapsto [[\mathit{pos}]]\}$$

states that there exist elements of classes $C_1$ and $C_2$ where the symbol at
position pos is the same. To state that there is no position where this is the
case, we introduce an element $\bot_f$ for each $f \in \mathcal{N}_l$, as we did with classes. In
general there will be more complicated links, with more complicated output data
values. Defining these is ongoing work.
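A matcher for symb_at_pos can be sketched as follows. The representation of a position as a tuple of 0-based argument indices, the tuple encoding of terms, and the function names are assumptions of this sketch; the treatment of $\bot_f$ is left out.

```python
# Illustrative sketch only. Terms are nested tuples (symbol, arg1, ...),
# atoms are strings, and a position is a tuple of 0-based argument indices.

def subterm_at(term, pos):
    """Follow a position into a term; return None if the position is absent."""
    for i in pos:
        if isinstance(term, str) or i >= len(term) - 1:
            return None
        term = term[i + 1]
    return term

def head(term):
    """The symbol at the root of a term (an atom is its own symbol)."""
    return term if isinstance(term, str) else term[0]

def symb_at_pos(dnf, e1, e2):
    """DNF over positions: some conjunct has the same symbol at all its positions."""
    def same(pos):
        s1, s2 = subterm_at(e1, pos), subterm_at(e2, pos)
        return s1 is not None and s2 is not None and head(s1) == head(s2)
    return any(all(same(pos) for pos in conj) for conj in dnf)

def link_holds(dnf, class1_elems, class2_elems):
    """Existential reading of a link: some pair of elements satisfies the matcher."""
    return any(
        symb_at_pos(dnf, e1, e2) for e1 in class1_elems for e2 in class2_elems
    )

# Two example terms: ((a * ((b and d) * c)) * e) and a * (((b * c) and d) * e).
g = ("*", ("*", "a", ("*", ("and", "b", "d"), "c")), "e")
h = ("*", "a", ("*", ("and", ("*", "b", "c"), "d"), "e"))
```

For these two terms the root symbols agree, so `symb_at_pos([[()]], g, h)` holds, while the symbols at the first argument differ, so `symb_at_pos([[(0,)]], g, h)` does not.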

In order to define orthogonality and subtyping we define the meet and join
for each name in $\mathcal{N}_l$.

Definition 7. $L_1 \wedge L_2$ is the greatest lower bound of $L_1$ and $L_2$ if
$\forall n \in \mathcal{N}_l.\ (L_1 \wedge L_2)(n) = L_1(n) \wedge_n L_2(n)$, where $\wedge_n$ computes the greatest
lower bound for link feature $n$.

Definition 8. $L_1 \vee L_2$ is the least upper bound of $L_1$ and $L_2$ if
$\forall n \in \mathcal{N}_l.\ (L_1 \vee L_2)(n) = L_1(n) \vee_n L_2(n)$, where $\vee_n$ computes the least
upper bound for link feature $n$.

As with classes, we introduce a semantic representation for the links using the
notation $x^s$ for some data $x$. Since the data is a list of lists of positions, we use the
same semantics as in (2). The intuition is that a link can then state several
alternatives for the positions at which the symbols of two classes should be the
same. The proofs and definitions of the lattice theory follow similarly to those
for classes. We then define orthogonality and subtyping for links:

Definition 9. $L_1$ and $L_2$ are orthogonal if $\exists f \in \mathcal{N}_l.\ L_1(f) \wedge L_2(f) = \bot_f$. We
write this as $L_1 \perp L_2$. $L_1$ is a subtype of $L_2$, written $L_1 <: L_2$, if
$\forall f \in \mathcal{N}_l.\ L_1(f) \wedge L_2(f) = L_1(f)$.

Goal Types. A goal type is a description of the conclusion, the related facts,
and the links between them:

Definition 10. A goal type is a record:

$$\mathit{GoalType} := \{\ \mathit{link} : \mathit{link},\ \mathit{facts} : \{\mathit{class}\},\ \mathit{concl} : \mathit{class}\ \}$$

where concl is the class describing the conclusion of a goal, facts is a set of
classes of relevant facts, and link is a link relating classes of facts and concl.

Note that we keep a set of classes of facts since some related facts form a
class and some do not. For example, in our example conjecture, hypothesis p
forms a class P (with top_symbol pure), while h forms a class H (with has_symbol
$[[\wedge, *]]$). Henceforth we assume that all members of facts are orthogonal; dealing
with the general case which allows overlapping is future work. Orthogonality
and subtyping of two goal types reduce to orthogonality and subtyping of their
respective classes and links. Due to the assumption of orthogonality between the
facts, they have a universal interpretation for $\perp$ and an existential
interpretation for $<:$:

$$G_1 \perp G_2 := G_1(\mathit{concl}) \perp G_2(\mathit{concl}) \vee G_1(\mathit{link}) \perp G_2(\mathit{link}) \vee \forall f_1 \in G_1(\mathit{facts}), f_2 \in G_2(\mathit{facts}).\ f_1 \perp f_2$$
$$G_1 <: G_2 := G_1(\mathit{concl}) <: G_2(\mathit{concl}) \wedge G_1(\mathit{link}) <: G_2(\mathit{link}) \wedge \exists f_1 \in G_1(\mathit{facts}), f_2 \in G_2(\mathit{facts}).\ f_1 <: f_2$$
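The two definitions on goal types transcribe almost directly into code. As a simplifying assumption of this sketch, each class and link is represented by a single denotation set (a `frozenset` of the elements it admits), rather than feature by feature; the names `GoalType`, `gt_orthogonal` and `gt_subtype` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GoalType:
    concl: frozenset   # denotation of the conclusion class (assumed encoding)
    facts: tuple       # tuple of frozensets, one per fact class
    link: frozenset    # denotation of the link

def orth(x, y):
    """Two denotations are orthogonal when their meet is empty."""
    return not (x & y)

def sub(x, y):
    """x is a subtype of y when the meet of the two gives back x."""
    return x & y == x

def gt_orthogonal(g1, g2):
    # Universal reading over fact classes; following the formula literally,
    # this disjunct is vacuously true if either goal type has no fact classes.
    return (
        orth(g1.concl, g2.concl)
        or orth(g1.link, g2.link)
        or all(orth(f1, f2) for f1 in g1.facts for f2 in g2.facts)
    )

def gt_subtype(g1, g2):
    # Existential reading over fact classes.
    return (
        sub(g1.concl, g2.concl)
        and sub(g1.link, g2.link)
        and any(sub(f1, f2) for f1 in g1.facts for f2 in g2.facts)
    )

G1 = GoalType(frozenset({1, 2}), (frozenset({10}),), frozenset({100}))
G2 = GoalType(frozenset({1, 2, 3}), (frozenset({10, 11}),), frozenset({100, 101}))
G3 = GoalType(frozenset({9}), (frozenset({10}),), frozenset({100}))
```

With these toy values, `G1` is a subtype of `G2`, and `G1` and `G3` are orthogonal because their conclusion classes are disjoint.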
4 Lifting of Goals and Tactics




The following meta-rule and procedure, described in [5], formalise one
evaluation step in a proof strategy:

1. Match and partly instantiate the LHS of the rule.

2. Evaluate the tactic function for the matched input and output types.

3. Finish instantiating the RHS with the lists $gs_i$ from the tactic.

4. Apply the fully instantiated rule(s).

Here $\alpha$ and $\beta_i$ are goal type variables. We assume t is an atomic tactic (see [8]
for graph tactics). In the second step, the underlying tactic has to be lifted from
$\text{proof state} \rightarrow \{[\text{proof state}]\}$ to the form $\alpha \rightarrow \{[\beta_1] \times [\beta_2] \times \ldots \times [\beta_n]\}$.
Firstly, a goal is an instance of a goal type for a particular proof state:

Definition 11. A goal is a record:

$$\mathit{goal} := \{\ \mathit{fmap} : \mathit{class} \rightarrow \{\mathit{fact}\},\ \mathit{ps} : \text{proof state},\ \mathit{parent} : \{\mathit{goal}\}\ \}$$

where parent is either a singleton or the empty set; it is empty if this is the first goal.
Type checking $g$ relies on the "typing predicates" associated with classes and
links. A goal $g$ is of type $G$ iff

— The conclusion in $g(\mathit{ps})$ matches $G(\mathit{concl})$.

— For each class $c \in G(\mathit{facts})$, $g(\mathit{fmap})(c)$ is defined, not empty, and each
$f \in g(\mathit{fmap})(c)$ matches $c$.

— For each $(l, c_1, c_2) \mapsto d \in G(\mathit{link})$ there exist elements $e_1 \in g(\mathit{fmap})(c_1)$
and $e_2 \in g(\mathit{fmap})(c_2)$ such that $l(d, e_1, e_2)$ holds. Moreover, for each
$e_1 \in g(\mathit{fmap})(c_1)$ there must be an $e_2 \in g(\mathit{fmap})(c_2)$ such that $l(d, e_1, e_2)$ holds
(and dually the other way around).
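The three typing conditions can be read as a checker along the following lines. This is a sketch under assumed encodings: goals and goal types are plain dictionaries, classes are predicates on elements, fact classes are referred to by name, and each link entry is a triple of two class names and a binary predicate (with the data already applied).

```python
def has_type(goal, goal_type):
    """Check whether `goal` is of type `goal_type` (assumed dict encodings)."""
    # Condition 1: the conclusion matches the conclusion class.
    if not goal_type["concl"](goal["ps"]["concl"]):
        return False

    # Condition 2: every fact class is populated, and by matching facts only.
    for name, pred in goal_type["facts"].items():
        facts = goal["fmap"].get(name, set())
        if not facts or not all(pred(f) for f in facts):
            return False

    # Condition 3: every element of one class has a partner in the other class.
    for n1, n2, holds in goal_type["link"]:
        es1 = goal["fmap"].get(n1, set())
        es2 = goal["fmap"].get(n2, set())
        if not es1 or not es2:
            return False
        if not all(any(holds(a, b) for b in es2) for a in es1):
            return False
        if not all(any(holds(a, b) for a in es1) for b in es2):
            return False
    return True

# Toy goal type: facts are strings, classes are simple string predicates.
toy_type = {
    "concl": lambda c: c.startswith("c"),
    "facts": {
        "P": lambda f: f.startswith("pure"),
        "H": lambda f: f.startswith("h"),
    },
    "link": [("P", "H", lambda p, h: h == "h1")],
}
toy_goal = {
    "ps": {"concl": "c1"},
    "fmap": {"P": {"pure(e)"}, "H": {"h1"}},
    "parent": None,
}
```

Since both related classes are required to be non-empty, the pairwise check already implies the existence of a matching pair.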

Now, to lift a tactic we need to: unlift goal $g$ to project the underlying proof
state; apply the tactic; and lift the resulting proof states to goals of a type in
$\{\beta_1, \ldots, \beta_n\}$ (which becomes instantiated to specific goal types when matching
the RHS in the first step). Then, for a list $L$ of proof states, let $lp(\beta_1, \ldots, \beta_n; L)$
be the set of all partitions of $L$ lifted into $n$ lists of goals
$\{(\mathit{map\ lift}\ L_1), \ldots, (\mathit{map\ lift}\ L_n)\}$, such that all of the goals in the $i$-th list
have goal type $\beta_i$. Then, we define lifting as:



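$$\mathit{lift}(\mathit{tac}) = \lambda g.\ \begin{cases} lp(\beta_1, \ldots, \beta_n;\ \mathit{tac}(\mathit{unlift}(g))) & \text{if } g \text{ is of type } \alpha \\ \emptyset & \text{otherwise} \end{cases}$$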



We are then left to define unlifting and lifting for a single goal node and a single 
goal type. Firstly, a naive unlifting of a goal simply projects the goal state. 
More elaborate unliftings are tactic dependent, and may e.g. add all facts from 
a particular fact class as active assumptions beforehand. 
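The lifting of a tactic, including the enumeration of partitions performed by $lp$, can be sketched as below. Everything about the encoding is an assumption of the sketch: the single-goal lifting function `lift_one` returns `None` on failure, the type check `has_type` and `unlift` are passed in as parameters, and alternatives are returned as a list rather than a set.

```python
from itertools import product

def lp(types, states, lift_one):
    """All ways to split `states` into len(types) lists such that every state
    placed in list i lifts successfully to types[i]."""
    options = []
    for s in states:
        opts = []
        for i, t in enumerate(types):
            g = lift_one(s, t)
            if g is not None:
                opts.append((i, g))
        options.append(opts)
    results = []
    # If some state cannot be lifted to any type, the product is empty.
    for choice in product(*options):
        parts = [[] for _ in types]
        for i, g in choice:
            parts[i].append(g)
        results.append(tuple(tuple(p) for p in parts))
    return results

def lift(tac, alpha, betas, has_type, unlift, lift_one):
    """Lift a tactic on proof states to a typed tactic on goals."""
    def lifted(goal):
        if not has_type(goal, alpha):
            return []
        out = []
        # tac returns alternatives, each being a list of new proof states.
        for branch in tac(unlift(goal)):
            out.extend(lp(betas, branch, lift_one))
        return out
    return lifted

# Toy instance: proof states are integers and goal types are predicates.
def positive(n): return n > 0
def even(n): return n % 2 == 0
def odd(n): return n % 2 == 1

toy = lift(
    tac=lambda n: [[n - 1, n - 2]],
    alpha=positive,
    betas=(even, odd),
    has_type=lambda g, t: t(g),
    unlift=lambda g: g,
    lift_one=lambda s, t: s if t(s) else None,
)
```

In the toy instance, `toy(5)` sends the subgoal 4 to the first output and 3 to the second, while a goal that is not of the input type yields no result.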
Lifting is a partial function, and an element of $lp$ is only defined if lifting of
all elements succeeds. There are several (type-safe) ways to implement lifting.
Here, we show a procedure which assumes that all relevant information is passed
down the graph from the original goal node. Any fact "added" to a goal node is
thus a fact generated by the tactic. However, one may "activate" existing facts
explicitly in the tactic which will then be used by lifting. A new goal $g'$ is then
lifted as follows, using the (new) proof state $ps'$, previous goal $g$, and goal type
$G$:

1. Set fields $g'(\mathit{parent})$ to $g$, and $g'(\mathit{ps})$ to $ps'$; fail if the conclusion does not
match $G(\mathit{concl})$.

2. For each $c \in G(\mathit{facts})$, set $g'(\mathit{fmap})(c)$ to be all facts in the range of
$g(\mathit{fmap})$ and newly generated facts which match $c$. If for any $c \in G(\mathit{facts})$,
$g'(\mathit{fmap})(c)$ is empty (or undefined) then fail.

3. Check all link features. For each $c \in G(\mathit{facts})$ which is used by a link feature,
filter out any element $e \in g'(\mathit{fmap})(c)$ not "captured" by a match of a related
link. Fail if, for any of the links, there does not exist an element in the related
classes for which it holds, or if any $g'(\mathit{fmap})(c)$ (for $c \in G(\mathit{facts})$) is empty
after this filtering step.
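The three steps of this lifting procedure can be sketched as a single function. The dictionary encodings, the use of predicates for classes, the reserved name "concl" for the conclusion class in link entries, and the function name `lift_goal` are all assumptions of this sketch rather than the implemented procedure.

```python
def lift_goal(ps_new, prev_goal, goal_type, new_facts=()):
    """Return the lifted goal, or None if lifting fails.

    ps_new    : dict with key "concl" (assumed shape of a proof state)
    prev_goal : dict with keys "fmap", "ps", "parent"
    goal_type : dict with keys "concl" (predicate), "facts" (name -> predicate),
                "link" (list of (name1, name2, predicate on two elements))
    """
    # Step 1: record parent and proof state; check the conclusion.
    if not goal_type["concl"](ps_new["concl"]):
        return None
    goal = {"ps": ps_new, "parent": prev_goal, "fmap": {}}

    # Step 2: re-classify the inherited and the newly generated facts.
    candidates = set(new_facts)
    for facts in prev_goal["fmap"].values():
        candidates |= facts
    for name, pred in goal_type["facts"].items():
        goal["fmap"][name] = {f for f in candidates if pred(f)}
        if not goal["fmap"][name]:
            return None

    # Step 3: keep only the facts that take part in a link match.
    def elems(name):
        return {ps_new["concl"]} if name == "concl" else goal["fmap"][name]

    for n1, n2, holds in goal_type["link"]:
        keep1 = {a for a in elems(n1) if any(holds(a, b) for b in elems(n2))}
        keep2 = {b for b in elems(n2) if any(holds(a, b) for a in elems(n1))}
        if not keep1 or not keep2:
            return None
        for name, kept in ((n1, keep1), (n2, keep2)):
            if name != "concl":
                goal["fmap"][name] = kept
    return goal

# Toy data: facts and conclusions are strings.
previous = {"ps": {"concl": "c0"}, "parent": None,
            "fmap": {"P": {"pure(e)"}, "H": {"h1", "h22"}}}
toy_type = {
    "concl": lambda c: c.startswith("c"),
    "facts": {"P": lambda f: f.startswith("pure"), "H": lambda f: f.startswith("h")},
    "link": [("concl", "H", lambda c, h: len(c) == len(h))],
}
```

In the toy data the link keeps only the hypothesis whose length equals that of the conclusion, which stands in for a real link matcher.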

Deriving Goal Types from Proof States. The details of turning a proof
written in e.g. Isabelle/Isar into an initial strategy which we generalise are beyond
the scope of this paper. For the next section, it is sufficient to know that we
have kept the shape of the proof tree resulting from the proof of (1). However, we
have generalised proof states (on the edges) into goal types, and where possible,
generalised tactics to refer to a class and not the named assumption (e.g. H
instead of h). We have attempted a "locally maximum" derivation of goal types,
where each assumption has become a class (using the two features), and the most
specific class has been used that is not $\top$. The resulting tree is shown left-most
in Figure 1. For space reasons we have not included the goal types, but provided
a name when referred to in the text. Discussion of capturing the proof (process)
[23] is deferred to §7.

5 Generalising Strategies 

The description of goal flow in terms of goal types, as shown in Figure 1
(left-most), is already a generalisation of a proof tree, since the goals can vary
slightly. However, this is still low-level, e.g. relying on the exact number of
applications of each tactic. Here, we describe techniques to further generalise the
strategy, achieved by generalising sub-parts of the graph. In particular, we focus on
introducing loops when sequences are spotted. This often requires goal type
generalisation, described first, as well as generalisation of (graph-)tactics. One
important property when performing such generalisations is that any proof valid for
a strategy should remain valid afterwards. If possible, this is something which should
be shown statically. In our examples, we will only provide informal justifications
for this property, and we will only deal with partial correctness.
Fig. 1. Steps made to generalise proof of (1)



Generalising Goal Types. In the context of goal types, generalisation refers
to computing the most general goal type for two existing goal types, while
weakening applies to only one goal type and makes the description of it more general.
Crucial to both generalisation and weakening is that multiple possible
generalised and weakened goal types exist.

We use the notion of a least upper bound for a goal type lattice, described in
§3, using the join operator $\vee$, to define generalisation for goal types. For a class
$C$, we write:

Definition 12. $C$ is a generalisation of $C_1$ and $C_2$, written $C = gen(C_1, C_2)$,
if $\forall f \in \mathcal{N}.\ C(f) = C_1(f) \vee C_2(f)$.

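As an example, consider the two classes shown in (3) and (4). We can compute
$C = gen(C_2, C_3)$ by appealing to the set theoretic semantics and transferring
back to the class representation. For $f_1 = \mathit{top\_symbol}$ and $f_2 = \mathit{has\_symbol}$ we
compute

$$C(f_1)^s = (\lfloor \wedge \rfloor \cup \lfloor * \rfloor) \quad\Rightarrow\quad C(f_1) = [[\wedge], [*]]$$
$$C(f_2)^s = ((\lfloor * \rfloor \cap \lfloor \wedge \rfloor) \cup (\lfloor \vee \rfloor \cap \lfloor * \rfloor)) \cup (\lfloor * \rfloor \cap \lfloor \wedge \rfloor \cap \lfloor \vee \rfloor)$$
$$\phantom{C(f_2)^s} = ((\lfloor * \rfloor \cap \lfloor \wedge \rfloor) \cup (\lfloor \vee \rfloor \cap \lfloor * \rfloor)) \quad\Rightarrow\quad C(f_2) = [[*, \wedge], [\vee, *]]$$

producing a generalised class:

$$C : \{(\mathit{top\_symbol} \mapsto [[\wedge], [*]]), (\mathit{has\_symbol} \mapsto [[*, \wedge], [\vee, *]])\}$$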
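The step of transferring back to the class representation can be done syntactically: the join of two DNF values is the union of their conjuncts, and any conjunct that strictly contains another can be dropped, since intersecting more atoms can only denote a smaller set. The sketch below relies on that observation; the function names and the treatment of a missing feature as unconstrained are assumptions of the sketch.

```python
def join_dnf(d1, d2):
    """Join of two DNF data values: union of the conjuncts, minus absorbed ones."""
    conjuncts = []
    for conj in d1 + d2:
        s = frozenset(conj)
        if s not in conjuncts:
            conjuncts.append(s)
    # A conjunct with strictly more atoms denotes a subset of one with fewer.
    minimal = [s for s in conjuncts if not any(t < s for t in conjuncts)]
    return [sorted(s) for s in minimal]

def gen(c1, c2):
    """Feature-wise join; a feature missing from either class is unconstrained,
    so it is also unconstrained (absent) in the result."""
    return {f: join_dnf(c1[f], c2[f]) for f in c1.keys() & c2.keys()}

# The classes labelled (3) and (4), writing the connectives as "and" / "or".
C2 = {"top_symbol": [["and"]], "has_symbol": [["*", "and"], ["or", "*"]]}
C3 = {"top_symbol": [["*"]], "has_symbol": [["*", "and", "or"]]}
C = gen(C2, C3)
```

For the two example classes, the conjunct containing all three symbols is absorbed, which agrees with the set-theoretic computation of $C(f_2)$.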

The definition of generalisation for links extends similarly from its associated
lattice theory described in §3. Recall that we assume orthogonality of fact classes.
Fig. 2. Generalisations: left/middle: loop rules; right: push-out over common subgraph



We define a function gen_map over two sets of (fact) classes, which generalises
pairwise each fact class. Here, for any two fact classes $H_1$ and $H_2$ in the
generalised set of fact classes, where $(H_1 <: H_2) \wedge \neg(H_1 \perp H_2)$, we only retain $H_2$,
thus ensuring orthogonality. We can then define a function gen on goal types to
be

$$\begin{array}{rl}
gen(G_1, G_2) := \{ & \mathit{concl} = gen(G_1(\mathit{concl}), G_2(\mathit{concl})), \\
 & \mathit{facts} = gen\_map(G_1(\mathit{facts}), G_2(\mathit{facts})), \\
 & \mathit{link} = gen(G_1(\mathit{link}), G_2(\mathit{link}))\ \}
\end{array}$$

Loop Discovery. The ability to identify repeated sequential applications of
the same tactic, and turn this into a loop, is a key way of generalising, since
it enables abstraction over the exact number of times a tactic is applied. When
working in a standard LCF tactic language, the problem is to know: (a) which
goals (in the case of side conditions) the tactic should be repeated on, and (b)
when to stop. This was highlighted in [6], where a regular expression language,
closely aligned with common LCF tacticals, was used to learn proof tactics, and
hand-crafted heuristics were defined to state when to stop a loop.

The advantage of our approach is that we can utilise the goal types to identify
termination conditions, reducing termination and goal focus to the same case,
thus also handling the more general proof-by-cases paradigm. We illustrate our
approach with what can be seen as an inductive representation of tactic looping,
as shown by rules loop1 and loop2 of Figure 2. Tactic generalisation is discussed
next and can be ignored for now, since we can assume that $gen(t, t) = t$. For
loop1, we can see that it is correct since $B \perp C$ ensures that a goal will exit the
loop when it matches $C$. Moreover, the $B <: A$ pre-condition ensures that the
tactic can handle the input type. For loop2, similar arguments hold for the
generalised $gen(B, B')$ edge.

Consider the left-most graph of Figure 1. Here, the stippled box highlights
the sub-graph which matches the rules shown above: loop1 is applied first,
followed by two applications of loop2. The classes are identical so we only discuss
link classes, which have the following values:

$$GT1(\mathit{link}) = GT2_n(\mathit{link}) = \{(\mathit{symb\_at\_pos}, G, H) \mapsto [[\bot]]\}$$
$$GT3(\mathit{link}) = \{(\mathit{symb\_at\_pos}, G, H) \mapsto [[1]]\}$$

where $GT2_n$ denotes the goal types in the intermediate stages of the repeated
application of tactic subst ax1. Now, for the sequence to be detected as a loop,
we must first discover

$$GT2' = gen(gen(GT2_1, GT2_2), GT2_3) = GT2_1$$

and show $GT2' <: GT1$ and $GT2' \perp GT3$. These are both true since $GT2'$ and
$GT1$ are equal, and $GT2'$ and $GT3$ are orthogonal due to the existence of $\bot$ in
the data argument denoting an empty feature.

Generalising Tactics. As well as generalising goal types, we can generalise
tactics. A simple example of this is when sets of rules are used as arguments for
the subst and rule tactics. Here, subst $R_1$ and subst $R_2$ can be generalised into
subst $(R_1 \cup R_2)$. It is straightforward to see that this generalisation preserves
any proofs from the definition of the tactic evaluation in §3, albeit possibly
introducing more search.
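The preservation argument for rule sets can be illustrated with a small sketch in which a rule is a function from a term to a list of results and subst over a set of rules behaves as an OR over its members. The rule encoding, the two toy rules and the restriction to rewriting at the root are assumptions made for illustration.

```python
def subst(rules, term):
    """OR over the rule set: collect the results of every applicable rule."""
    results = []
    for rule in rules:
        for r in rule(term):
            if r not in results:
                results.append(r)
    return results

def generalise(rules1, rules2):
    """Generalising subst R1 and subst R2 yields subst over the union."""
    return rules1 | rules2

# Two toy rules on binary terms (symbol, left, right), applied at the root only.
def assoc(t):
    """(A * B) * C  ->  A * (B * C)"""
    if isinstance(t, tuple) and t[0] == "*" and isinstance(t[1], tuple) and t[1][0] == "*":
        return [("*", t[1][1], ("*", t[1][2], t[2]))]
    return []

def comm(t):
    """A * B  ->  B * A"""
    if isinstance(t, tuple) and t[0] == "*":
        return [("*", t[2], t[1])]
    return []

R1, R2 = {assoc}, {comm}
term = ("*", ("*", "a", "b"), "c")
```

Every result produced by `subst(R1, term)` or `subst(R2, term)` also appears in `subst(generalise(R1, R2), term)`, so existing proofs are kept; the additional results are the extra search.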

Graph tactics can also be generalised, however, this means combining two 
graphs into one. As shown in [5], in the category of string graphs, two graphs are 
composed by a push-out over a common boundary. We can combine two graph 
tactics in the same way by a push-out over the largest common sub-graph. Note 
that this is likely to change the boundary of the graph, so it is only valid under 
certain circumstances. 

Step s2 of Figure 1 layers the highlighted sub-graphs into the graph tactics
pax2a and pax2b. Such layering can be done for a (connected) sub-graph if the
inputs and outputs of the sub-graphs are respectively orthogonal. Now, if pax2a
and pax2b can be generalised, then we can apply loop1, provided the (generalised)
graph associated with the tactic has the same boundary as specified by the
(instantiated) RHS of the loop1 rule. As shown in the right-most diagram of
Figure 2, this is exactly what the push-out approach provides, and this step
becomes s3 of Figure 1.

A Mutation Proof Strategy. The right-most graph of Figure 1 shows the
result of the generalisations we have applied. It can be read as

— continue applying ax1 while $\{(\mathit{symb\_at\_pos}, G, H) \mapsto [[\bot]]\}$

— then apply ax2 while $\{(\mathit{is\_match}, G, H) \mapsto [[\mathit{false}]]\}$

— finally, discharge the goal with the H class.

This is in fact a version of the mutation proof strategy developed to reason 
about functional properties in separation logic pT]. Now consider the following 
"similar" conjecture: 

$$p' : \mathit{pure}(d),\; h' : a * (((b * c) \wedge d) * e) \;\vdash\; ((a * ((b \wedge d) * c)) * e)$$

This conjecture can be proven by the following sequence of tactic applications:

subst ax1; subst ax2; rule p'; rule h'

and can be accounted for by the generalised proof strategy. It is important to note
that

— The assumptions are named differently so use of e.g. p instead of class P (or
h instead of class H) will fail.

— The initial strategy will not work since there is a different number of rule
applications.

— Naive loop applications will not work. For example, ax1 is applicable after
the first step, but more applications will cause the rest of the strategy to
fail. However, the goal types (GT2' / GT3) ensure the goal is (correctly) sent
down the GT3 edge.
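The replay of the generalised strategy on this second conjecture, and the failure of a naive loop, can be simulated with a small rewriting sketch. Several things here are assumptions of the sketch and not taken from the paper: the tuple encoding of binary terms, the choice of "symbol at the first argument" as the concrete instance of symb_at_pos, and taking the first available rewrite instead of branching over all alternatives.

```python
# Terms: atoms are strings; compound terms are tuples (symbol, left, right).

def ax1(t):
    """(A * B) * C  ->  A * (B * C), at the root only; None if not applicable."""
    if isinstance(t, tuple) and t[0] == "*" and isinstance(t[1], tuple) and t[1][0] == "*":
        (_, (_, a, b), c) = t
        return ("*", a, ("*", b, c))
    return None

def make_ax2(pure):
    """(A and B) * C  ->  (A * C) and B, provided B is in the set `pure`."""
    def ax2(t):
        if isinstance(t, tuple) and t[0] == "*" and isinstance(t[1], tuple) and t[1][0] == "and":
            (_, (_, a, b), c) = t
            if b in pure:
                return ("and", ("*", a, c), b)
        return None
    return ax2

def subst(rule, t):
    """All results of applying `rule` once, at any position of t."""
    results = []
    at_root = rule(t)
    if at_root is not None:
        results.append(at_root)
    if isinstance(t, tuple):
        sym, left, right = t
        results += [(sym, new_left, right) for new_left in subst(rule, left)]
        results += [(sym, left, new_right) for new_right in subst(rule, right)]
    return results

def head(t):
    return t if isinstance(t, str) else t[0]

def same_symbol_at_first_arg(g, h):
    """Assumed instance of symb_at_pos: compare the symbols at the first argument."""
    return isinstance(g, tuple) and isinstance(h, tuple) and head(g[1]) == head(h[1])

def mutation(concl, hyp, pure):
    """Replay: ax1 while the link fails, then ax2 until concl matches hyp."""
    while not same_symbol_at_first_arg(concl, hyp):
        steps = subst(ax1, concl)
        if not steps:
            return False
        concl = steps[0]      # a full evaluator would branch over all steps
    ax2 = make_ax2(pure)
    while concl != hyp:
        steps = subst(ax2, concl)
        if not steps:
            return False
        concl = steps[0]
    return True               # the goal is discharged by the hypothesis

concl2 = ("*", ("*", "a", ("*", ("and", "b", "d"), "c")), "e")
hyp2 = ("*", "a", ("*", ("and", ("*", "b", "c"), "d"), "e"))
```

Under these assumptions `mutation(concl2, hyp2, {"d"})` succeeds after one ax1 step and one ax2 step, whereas applying ax1 exhaustively first leads to a term from which the hypothesis can no longer be reached with ax2.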

6 Related Work 

We extend [5], which introduces the underlying strategy language, by developing
a theory for goal types which we show form a lattice, and using this property to
develop techniques for generalising strategies.

Our goal types can be seen as a lightweight implementation of
pre/post-conditions used in proof planning [J], with the additional property that the
language captures the flow of goals. It can be seen as further extending the
marriage of procedural and declarative approaches to proof strategies |2I11I7|,
and addressing issues related to goal flow and goal focus highlighted in pQ; for
a more detailed comparison we refer to [8].

The lattice based techniques developed for goal type generalisation are
similar to anti-unification [20], which generalises two terms into one (with
substitutions back to the original terms). Whilst each feature is primitive, the goal
type has several dimensions. More expressive class/link features, which are future
work, may require higher-order anti-unification [16], and such ideas may also
be applicable to graph generalisations. Other work that may become relevant
for our techniques is graph abstractions/transformations used in algorithmic
heap-based program verification techniques, such as [3], and for parallelisation
of functional programs [10].

As already discussed, the problem when ignoring goal information is that
one cannot describe e.g. where to send a goal or when to terminate a loop in a
way sufficiently abstract to capture a large class of proofs. Instead, often crude
heuristics have to be used in the underlying tactic language. This is the case for
[6], which uses a regular expression language (close to LCF tactics), originally
developed in [12] to learn proof plans. [12] further claims that explanation based
generalisation (EBG) QIj] is applied to derive pre/post-conditions, but no details
of this are provided. An EBG approach is also applied to generalise Isabelle
proof terms into more generic theorems in [13]. This could provide an alternative
starting point for us; however, one may argue that much of the user intent will
be lost by working in the low-level proof term representation. Further, note
that our work focuses on proof of conjectures which requires structure, meaning
machine learning techniques, such as [52], which learns heuristics to select
relevant axioms/rules for automated provers, are not sufficient. However, in
[D], an approach to combine essentially our techniques with more probabilistic
techniques to cluster interactive proofs [15] was outlined.
We would also like to utilise work on proof and proof script refactoring [24].
This could be achieved either as a pre-processing step, or by porting these
techniques to our graph based language. Finally, albeit for source code, [T5] argues
for the use of graphs to perform refactorings, which further justifies our graph
based representation of proof strategies for the work presented here.

7 Conclusion & Future Work

In this paper we have introduced a theory of goal types for a graphical proof
strategy language and, limiting it to a small set of features, we have shown that
goal types form a lattice. We then utilised this to develop a theory of sub-typing,
which we exploited to generalise proof strategies in order to reuse the same
strategy across a class of conjectures. We have further shown how to evaluate/replay
a (generalised) strategy, and have illustrated with examples throughout.

Next, we plan to incorporate sub-typing in the underlying theory of the
language in order to utilise it when composing graphs. We may also need to
develop a more theoretical notion of graph generalisation, which will be less 
restrictive than rewriting and may not require boundary preservation. 

We are also in the process of developing and incorporating more class and
link properties, including less syntactic ones (e.g. the origin of a goal, or if it is
in a decidable sub-logic), and other techniques for generalising goal types and
strategies. This will require the development of heuristics to guide the
generalisation since there will be many possibilities. We are also planning to apply
the techniques to extract strategies from a corpus of proofs. Here we believe we
have a much better chance of finding and generalising common sub-strategies,
and may also incorporate probabilistic techniques as a pre-filter [S]. Such work
may help to indicate which class/link features are more common, and can be
used to improve the generalisation heuristics discussed above. Further, we would
like to remove the restriction that facts have to be orthogonal, and improve the
subtyping to handle this case.

We only briefly discussed the process of turning proofs into initial low-level
proof strategy graphs. With partners on the AI4FM project (www.ai4fm.org)
we are working on utilising their work on capturing the full proof process, where
the user may (interactively) highlight the key features of a proof (step) [23]. This
can further help the generalisation heuristics.

Finally, a prototype of the language described in [5] has been implemented
by combining Isabelle with the Quantomatic graph rewriting engine [14]. This
will form the basis of implementing fully the work described here.